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The push-relabel algorithm can be used to calculate rapidly the exact ground states for a given sample with a 
random-field Ising model (RFIM) Hamiltonian. Although the algorithm is guaranteed to terminate after a time 
polynomial in the number of spins, implementation details are important for practical performance. Empirical 
results for the timing in dimensions d = 1,2, and 3 are used to determine the fastest among several implemen- 
tations. Direct visualization of the auxiliary fields used by the algorithm provides insight into its operation and 
suggests how to optimize the algorithm. Recommendations are given for further study of the RFIM. 



I. INTRODUCTION 

In systems with quenched disorder, changes in the ran- 
dom background take place over time scales that are much 
longer than the time scale for evolution of the primary de- 
grees of freedom. In magnetic systems, the spin degrees 
of freedom interact in an effectively frozen random en- 
vironment determined by substitutional disorder or vacan- 
cies. The random-field Ising model (RFIM), defined as 
ferromagnetically-coupled spins subject to a spatially vary- 
ing magnetic field, is a prototypical model for magnets with 
quenched disorder that has been studied since the 1970s (for 
reviews, see, e.g., Q]). In dimensions d > 2, the RFIM has 
a transition between ferromagnetic (FM) and paramagnetic 
(PM) states; this transition can be found by varying either 
the temperature or the disorder, at sufficiently small values 
of the remaining control parameter Fishman and Aharony 
121 mapped the RFIM with a field of random sign and fixed 
magnitude with disordered bonds to an experimentally real- 
izable system: the diluted antiferromagnet in a field (DAFF) 
Isl. At low temperatures, the glassy behavior of the DAFF 
seen in both experiment and theory leads to non-equilibrium 
effects such as history dependence and a broad range of relax- 
ation times. These observations are qualitatively consistent 
with predictions by both Fisher and Villain of an exponential 
slowing down near the critical point and of a low temperature 
phase described by the zero-temperature critical point |i4n5jj]. 

Exponential slowing down also affects optimization meth- 
ods such as simulated annealing 0|, that are modeled on the 
dynamics of the physical system. The large barriers to equi- 
libration make it very time consuming to sample configura- 
tion space accurately at finite temperatures or to find the ex- 
act zero-temperature ground states using such methods. Find- 
ing the partition function for the RFIM at finite temperature is 
NP-hard Ql^. However, the zero-temperature FM-PM tran- 
sition is expected to be in the same universality class as the 
finite temperature transition. So it is fortunate that there are 
alternate methods for quickly finding the exact RFIM ground 
state. These methods are based on a mapping from the prob- 
lem of finding the ground state of the RFIM to that of find- 
ing the maximum flow — or, equivalently, the minimum cut — 
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of the RFIM ground state properties lllll utilized the push- 
relabel (PR) algorithm for max-flow introduced by Goldberg 
and Tarjan tl2il . The PR algorithm is in practice the most ef- 
ficient algorithm available for many classes of problems ITsll . 
Although its generic implementation has a polynomial time 
bound, its actual performance depends on the order in which 
operations are performed and which heuristics are used to 
maintain auxiliary fields for the algorithm. Even within this 
polynomial time bound, there is a power-law critical slowing 
down of the PR algorithm at the zero-temperature (T ~ 0) 
transition ll L.14,...15J. 

This paper presents results that are useful for minimizing 
CPU time in RFIM ground-state simulations. We begin with a 
brief review of the RFIM, including its definition and phases, 
in Sec. HI] We then discuss the implementations for the PR 
algorithms that we study. The PR algorithm redistributes the 
random magnetic field among spins by pushing positive field 
"downhill" with respect to a potential field (the "heighf ') de- 
fined for each spin. This redistribution and coalescence of 
positive field and negative field "sinks" allows for the deter- 
mination of same-spin domains. The data structures used to 
organize the pushes and the heuristics used for updating the 
potential field are described in Sec. [TO] The results for the tim- 
ing using various heuristics, summarized in Sec. II VI should 
be useful for designing further extensive studies of the RFIM. 
Visualizing the auxiliary fields leads to a clearer explanation 
of the timing results and was important in guiding our work. 
In Sec. |V] we use these visualizations to present a qualitative 
overview of the operations of the different implementations in 
the distinct phases of the RFIM. The primary results of the pa- 
per, namely the recommended choices for the PR algorithm, 
are summarized in Sec. I VII 



II. RANDOM-FIELD ISING MODEL 

The random-field Ising model has a non-trivial ground state 
due to the competition between the feiTomagnetic interaction 
that tends to align neighboring spins and the influence of ran- 
dom fields, which tends to force spins to point in random di- 
rections. Taking the strength of the ferromagnetic interactions 
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between neighboring spins to be J and designating the local 
random fields by hi, the energy of a spin configuration is Ql 

H = -J^^SiSj -^^hiSi, (1) 

where the spin at a given site i on a c?-dimensional lattice 
with n ^ L'^ sites can take on values = ±1 and the sites 
i and j in the sum in the first term are nearest neighbors. We 
study the Gaussian RFIM, where the hi are independent vari- 
ables chosen from a Gaussian distribution with mean and 
variance A^J^, with periodic boundary conditions. The pa- 
rameter A characterizes the strength of the disorder relative to 
the ferromagnetic interaction. 

As the disorder dominates over thermal fluctuations at large 
length scales and low temperature in more than two dimen- 
sions, the ground states of this model are of interest. In dimen- 
sions d > 2, there is a zero-temperature transition between 
two phases at the critical disorder A = Ac. When A < Ac, 
the ferromagnetic interaction between nearest neighbors dom- 
inates and the spins take on a mean value m = ^ ■ Sj with 
|m| 7^ in the limit 72 oo. In the case A > Ac, randomness 
dominates and the ground state is "paramagnetic" |m| = 0, as 
n — > oo. 

In dimensions d ~ 1 and 2, the ground state is in the para- 
magnetic phase at any A > in the thermodynamic limit 
(i.e., Ac = 0). But the correlation length ^, characterizing the 
range of spin-spin correlations or the size of uniform spin do- 
mains, diverges as A~^ when o? = 1 (see, e.g., fl^ ). For sam- 
ples of size L < ^, i.e., for A < A^: L~^/^, where /S.x[L) 
is the sample-size-dependent crossover disorder, the ground 
state is essentially ferromagnetic. It has been argued that In(^) 
scales as an inverse power of A when d^2 Lllii,!!]. This 
very rapidly growing correlation length gives rise to an ap- 
parent ferromagnetic phase in c? = 2 at even moderate A^; 
in finite samples I20I1 . When we take the limit A — > in 
d = 1 and d = 2, we will be taking A <C A^^. Many of the 
results for the algorithm, such as which algorithm is fastest, 
hold independently of the existence of a true physical phase 
transition. 



III. PUSH-RELABEL ALGORITHM: DATA STRUCTURES 
AND HEURISTICS 

We now discuss the motivation for using the PR algorithm 
and outline its structure. In particular, we define the auxil- 
iary fields and basic operations that operate on those fields to 
determine the ground state. Picard and Ratliff showed that 
any quadratic optimization problem, such as the RFIM, can 
be mapped onto a min-cut/max-flow problem @|. T here are 
a number of algorithms for solving max-flow flV\, though 
Cherkassky and Goldberg's results show that the PR algorithm 
lll2il is often the best algorithm for solving large problems on 
a variety of graphs lUsIl . For detailed explanations of the map- 
ping of the RFIM to a max-flow problem and the proofs of the 
correctness of the PR algorithm, see reviews of applications of 
combinatorial optimization to statistical physics I22l |23L 12411 



and computer science texts ll2lll . In this paper, we limit our- 
selves to a description of the algorithm, neglecting proofs 
of correctness, but including a "physical" description for the 
variant of the PR algorithm we have used. 

Intuitively, the PR algorithm finds the domains of uniform 
spin in the ground state by rearranging ("pushing") the mag- 
netic field. If the bond between two spins is strong enough, 
the field on one spin can be removed from one spin and added 
to its neighbor, possibly influencing the direction of the neigh- 
boring spin. This rearrangement (and change in the bonds, as 
described below) is the push. Subsequent pushes can then 
affect distant spins. As a consequence of pushes, positive 
and negative fields originally located on separate spins can- 
cel. The cancellation leads to domains of uniform sign for 
the excess fields. This domain growth by rearrangement of 
field is limited by the strength of the nearest-neighbor bonds, 
which "carry" the rearrangement. The push operation reduces 
or removes the interactions between neighboring spins. When 
the field has large variations compared to the bond strengths, 
the magnetic field cannot be pushed very far as bonds become 
saturated and block further rearrangement. Conversely, in the 
limit of weak fields, large domains form, as the bonds fa- 
vor alignment of nearest neighbor spins: in the language of 
PR, the large capacity of the bonds relative to the strength 
of the fields allows for long-range rearrangements of the ran- 
dom field. In the limit of very weak fields, the field can be 
pushed anywhere (no bonds are saturated), so there is only 
one domain, whose orientation is simply a result of the sum 
of the random fields on all the spins. But when the field is very 
strong (A ^ 1), the rearrangement of field is limited, and, in 
most cases, the domains have single spins and the orientation 
of a spins is in the same direction as its magnetic field. 

The other basic operation is the relabel operation. This op- 
eration updates an auxiliary field, the height, defined for each 
spin. Following the more detailed constraints described in 
Sec. nil Al this height guides the pushes. Most simply put, 
pushes are always in the downhill direction. When a push is 
not possible from a given site, the relabel operation increases 
the magnitude of the height of that site. This relabelling will 
either allow a push to be executed or will identify the spin as 
having a particular sign in the ground state. 

The basic operations can be carried out using a variety of 
ordering methods. The choice of method affects the running 
time of the algorithm. A specific algorithm is defined by two 
sets of choices, which are defined and described in detail in 
the following subsections: 

1. the dynamically-determined order local operations 
(pushes and relabels), which is organized by a choice 
of data structure, and 

2. heuristic manipulations of the auxihary fields, namely, 
global updates and gap relabehng. 

We implemented the PR algorithm in Java and C++. The Java 
implementation can visualize the evolution of the auxiliary 
fields used by PR. The C-n- code relied on the original C code 
1I25I1 developed by Cherkassky and Goldberg iflsT . The codes 
can be downloaded from our web site ll26ll . 
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A. Auxiliary fields and push-relabel operations 

The PR algorithm uses three auxiUary fields to guide the 
rearrangement of the magnetic field and to enforce the con- 
straints given by the bond strengths. One field is the resid- 
ual bond strength (more commonly referred to as the residual 
capacity in the literature |[21;]). This residual interaction be- 
tween sites is denoted r^ . Initially, ry = rji = J for all 
nearest neighbor pairs {i, j), but in general need not equal 
Vji during the execution of the algorithm. This residual bond 
strength defines the paths along which excess magnetic field 
can be pushed. A site j is said to be reachable from a site i if 
there is a directed path from i to j with rui > for all bonds 
{k, I) along that path. 

For each site i, an excess field e; and a height field Ui 
(which is often called "distance" or "rank") are also defined. 
At the beginning of the algorithm, the excess, which can be 
positive or negative, is set equal to the random field strength, 
Ci = hi, and the height field Ui is set equal to the dis- 
tance to the nearest reachable site with negative excess. Sites 
from which no site with negative excess can be reached have 
Ui = oo. Sites with negative excess have Ui — 0. 

The PR algorithm maintains the height field Ui in a fashion 
designed to move the positive excess "downhill" (to smaller 
heights), i.e., towards sites with negative excess, where pos- 
sible. If it becomes impossible to rearrange excess towards 
a site with negative excess, the site is given a height label 
= oo (in practice, oo is represented by n, the number of 
spins). A site j is said to be accessible from i if the residual 
bond strength ry > and Ui = Uj + 1. Any site with posi- 
tive excess and height Ui < oo is active. (In the double queue 
method mentioned in Sec. lIIIBl negative excess sites can also 
be active.) 

The push operation moves excess from an active site i to an 
accessible neighbor j. Letting 5 = min(rij, e^), the excesses 
are updated by — 5, ej Cj + 6, while the residual 

bond strengths are updated according to r,;j — > — 6 and 
rji Tji + 5. If an active site has no accessible neighbor, so 
that no pushes are possible, it is relabeled: the height Ui is set 
to one greater than the height of the lowest neighbor j (i.e., 
minimal Uj over neighbors j) for which > 0. If no such 
neighbor exists, the height of i may immediately be raised 
to Ui = oo. We call the combination of all possible push 
operations from a single spin possibly followed by a relabel a 
PR step. 

The PR algorithm terminates when no active sites remain. 
The total number of PR steps needed to complete the algo- 
rithm is denoted by A^pr. The assignment of spin orienta- 
tions in the ground state is found by executing a global update 
(Sec. nil C> to finalize which sites have Ui = oo. Sites with 
Ui = oo are assigned Si = 1, while the remaining spins have 
Si = -1. 



B. Data structures 

We implemented the PR algorithm using four different data 
structures: a first-in-first-out queue (FIFO), a highest height 



priority queue implemented as a heap (HPQ), a lowest height 
priority queue (LPQ) also implemented as a heap, and a dou- 
ble FIFO queue (DFIFO) that treats positive and negative ex- 
cess symmetrically. (We also tried a stack or last-in first-out 
(LIFO) structure, but rejected it due to vastly longer running 
times at large A.) 

The FIFO structure is a list of active sites in the lattice. The 
site at the front of the list, i, is removed. Excess is pushed 
away from i if possible. If any inactive neighbors j of i are 
made active through a push operation, they are added to the 
end of the list. If there are no accessible neighbors, i is rela- 
beled. If i is still active after this PR step, it is also appended 
to the end of the list. 

The HPQ structure \27] is more complex than FIFO. Like 
the FIFO structure it contains a list of all active sites. The 
list is not organized by the temporal order in which sites have 
been treated, as would be the case in a FIFO queue, but by the 
height of the sites. The first site in the HPQ is always a site 
with maximal height. When we remove a site from the front 
of the queue, the site with the highest height that is still in the 
queue moves to the front. If we add a site with a larger height 
than any of the sites already in the queue, the new site moves 
to the front of the queue. If after a PR step a site is still active, 
it is re-added to the queue. Since a PR step relabels an active 
site by increasing its height by at least one before adding it 
back to the queue, such a site is still of maximal height and 
is added at the front. Thus the same site might be acted upon 
by the algorithm many times in a row. In practice, the HPQ 
is simply implemented by sorting the sites into bins given by 
their height values. 

The LPQ structure 12^ is exactly the same as the HPQ 
structure, except that the list is reversed, so that sites i with 
lowest height Ui are subject to PR operations. 

FIFO, HPQ, and LPQ enforce an asymmetry between pos- 
itive and negative excess. Why should one sign (positive ex- 
cess) be pushed around while the other (negative excess) re- 
mains static (except by rearrangement of positive excess onto 
a negative excess of lesser magnitude)? Certainly, the physi- 
cal problem is unaltered by the replacement hi --hi. Our 
fourth implementation treats positive and negative excesses 
on an equal footing. Negative excess then moves from lower 
heights (here, Ui < is a possible height label) to higher 
heights. We implemented this as two FIFO queues (DFIFO), 
one for positive and one for negative excess sites. Both queues 
are updated simultaneously. As noted in the next section, we 
also used two height fields, one for each sign of excess, when 
implementing DFIFO. 



C. Heuristics 

If these queues are adapted as described so far, the algo- 
rithm, though polynomial in n, is too slow to be practical for 
studying larger systems. Heuristics can be used to manipulate 
the height field to guide the push-relabel operations. Good 
heuristics are crucial to the practicality of the algorithm. 

All our implementations use the "global update" heuristic 
for initialization of the u;. A global update 111381 is used for 
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two purposes. It creates gradients in the height field from the 
sites with positive excess to the nearest site with negative ex- 
cess, allowing the excess fields to move efficiently towards 
annihilation. It addition, it identifies regions that are discon- 
nected from the rest of the spins and do not contain any neg- 
ative excess; such regions are labeled as spin-up regions and 
are removed from further consideration. Without this identi- 
fication it would take of the order of nr operations to mark a 
region of r spins as up. The global update is implemented as 
a breadth first search starting from the set of negative excess 
sites and takes of the order of n operations on a hypercubic 
lattice. During this search, the height of each vertex is set to 
the distance to the nearest reachable site with negative excess 
(sink). One important result of global updates is that local 
minima of the height field with positive height, often residu- 
als of former negative excess sites, are eliminated at sites with 
nonnegative excess. Following common practice, we choose 
the interval between global updates to be fixed, with a global 
update executed after every F push-relabel (PR) steps. 

The global update needs to be modified for the double 
queue (DFIFO) approach. In parallel with the height field for 
positive excess, based on a search from nodes with negative 
excess, it is natural to construct the height field for negative 
excess sites using positive excess sites as "sinks". This creates 
an ambiguity for sites with zero excess. Should their height be 
determined in relation to the positive or negative excess sites? 
We did not directly resolve this ambiguity, but instead used a 
scheme with separate height fields for the positive and nega- 
tive excesses, with separate relabeling and global updates. 

In addition, when using the HPQ data structure, we use gap 
relabeling I13ll . If there is a height Ug, such that no site has 
height u = Ug, but there are active sites with u > Ug, i.e., 
there is a gap in the set of heights, all sites with u> Ug are as- 
signed the maximum height. This reduces the need for global 
updates as active sites with u> Ug are disconnected from the 
sinks. When using HPQ, regions with larger height tend to 
have their height raised uniformly, leading to the possibility 
of creating a gap quickly. As the gap is a global tally, it is not 
efficient at detecting regions that have separated themselves 
from their surroundings locally, so gap relabeling is less useful 
in, e.g., FIFO, where updates are carried out without respect 
to height. To facilitate gap relabeling, the HPQ algorithm uses 
an array that contains the number of sites (whether active or 
not) at each possible height. A gap is created when the occu- 
pation number at a non-maximal height is reduced to zero. 



IV. COMPARISON OF IMPLEMENTATIONS 

To find the best available combination of data structure and 
heuristics for solving the Gaussian RFIM, we measured the 
running time of the algorithm for a variety of combinations. 
We measure the running time for the algorithm to find the 
ground state using both CPU time t in seconds and the to- 
tal number of PR steps N-p-R.. PR steps are the core operations 
of the algorithm and give us a machine independent measure 
of the improvement due to the heuristics. However, PR steps 
don't account for the time needed for the internal bookkeep- 



ing of the data structure, nor do they reflect the time needed 
for the heuristics, such as the global update. The CPU time 
t is therefore another useful measure of the performance and 
ultimately what we want to minimize. For the timing mea- 
surement we used SUN's Java Virtual Machine 1.4.2 on Dual 
IGHz PHI machines with 5 12MB of RAM running Linux. 

Here we summarize the results of our timing runs. We first 
describe our results for two-dimensional lattices. In this case 
there is no phase transition for the RFIM ground states at finite 
A. However, as the correlation length ^ diverges very rapidly 
as A decreases, there is a crossover disorder value A^(i) at 
which ^ exceeds the linear system size L. For A < /S.x{L), 
the 2D system is effectively ferromagnetic. The dependence 
of Aj; on L is very slow llTlnSlhgr. In our simulations, we 
find A;r(8) ~ 1.7and Aa.(4096) « 0.55. In three dimensions, 
there is a true transition at A = Ac « 2.27 fl4ll . 

In this section, we compare the timings for FIFO and HPQ 
structures and then present results for DFIFO. (Results for 
LPQ are included in the discussion of the results for d = 3, 
near the end of this section.) When seeking to minimize the 
running time, there is a competition between frequent global 
updates (small F), which improve the efficiency of the PR 
steps and infrequent updates (large F), which save the cost 
of performing a global update. We first determine the global 
update interval that minimizes the CPU time over all choices 
of A. Generally, for all data structures, we find a minimum in 
t for Fniin ~ n (or at least that performance is not improved 
for other F). The FIFO and HPQ structures then give compa- 
rable results away from the crossover A w A^:, though FIFO 
is preferable at small A and HPQ is faster at large A. DFIFO 
takes significantly more time than either HPQ or FIFO to find 
the ground state for A on the order of A^^- We also find that 
the discreteness of global updates can lead to plateaus in the 
number of global updates Nq as a function of F. The 3D re- 
sults are generally consistent with our results for 2D lattices. 
We find that LPQ always performs at least as well as HPQ and 
is faster for A > Ac. 



A. Choosing the global update interval in two dimensions 

We first examine the effect of varying the update interval 
F for two dimensional samples. Global updates are only use- 
ful if they reduce the number of PR steps needed to find the 
solution. As we mentioned in Sec. IIII CI global updates are 
expensive, taking of the order of n operations, and should 
therefore not be performed too frequently. To find the optimal 
update interval r,„i,i, we varied F for each data structure over 
several orders of magnitude for various values of the random 
field strength A and examined how both A^pr, the number 
of push-relabel cycles used to find the ground state, and the 
running time t varied. 



I. Global update inten'al for FIFO 

Immediately after a global update, push operations are 
guaranteed to move excess towards the nearest sinks. After 
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a global update, however, some sinks are soon annihilated by 
excess pushed into them. If a sink at site s is annihilated, the 
height field around s no longer indicates the shortest distance 
to a sink; the height field still slopes towards s although there 
is no longer a sink. The extent of this "misleading" height 
field depends on the density of sinks and is limited by the 
distance between sinks. When F is small, global updates are 
frequent enough that the height field generally leads positive 
excess towards a sink. As long as this is the case, increas- 
ing the frequency of global updates does little to reduce A^pr. 
Therefore, for small F, we expect A^pr to be almost indepen- 
dent of F for FIFO. 

Fig. [0 displays algorithm costs Npji and t for the case 
A = 2.2, which exceeds A^;. The expectation that A^pr is 
independent of F at small F is confirmed by the plot of the 
mean of A'pr as a function of F displayed in Fig. [Ha). As we 
increase F, a larger and larger fraction of the height field be- 
comes incorrect between global updates: more minima of the 
height field no longer contain sinks. The plot shows that A'pr 
starts to increase significantly above F « O.I71. To minimize 
the running time t, we need to balance the cost of additional 
global updates with the reduction in A'pr. A global update 
takes of the order of n operations and, in fact, Fig.^^b) shows 
a minimum in t vs. F at Fmin ~ n. To verify our assumption 
that Fmin ~ n, we rescaled the curves in both Fig. [Qa) and 
(b) by dividing F, Apr, and t by n. The collapse of the data 
verifies the minimum in t vs. F at Tnun ~ n and shows that 
the running time of the algorithm scales nearly linearly with 
nin 2D (Fig. at fixed A. 



2. Detailed look at the FIFO data 

While determining the optimal global update interval for 
the FIFO structure, we noticed two features of interest in the 
data. These features are the tendency for A^pr to be an integer 
multiple of F and, at small A, a separation of the mean run 
times between positively and negatively magnetized samples. 

A close look at the data reveals piecewise linear behavior 
in the average running time of the algorithm, when measured 
by A'pR, as displayed in Fig.|3 These linear regions are con- 
sistent with A^pR = fcF, for integer k, as shown by the dashed 
lines in the figure. These linear regions coincide with plateaus 
in the number of global updates executed during the solution, 
Nc, when plotted vs. F (see the inset in Fig.l^. The plateaus 
in A'g' vs. F reflect the effect of the global update, which 
causes large changes in the height field and can bring the aux- 
iliary fields close to a solution. Note that a ground state is 
found by the algorithm when all of the positive excess is con- 
fined to regions that are isolated from sinks. This isolation is 
due to saturated bonds that block the rearrangement of flow 
(and to the cancellation of positive excess with negative ex- 
cess in regions accessible to the sinks). The algorithm will 
not terminate until the blocked-off regions have their heights 
raised to Ui = n. Without global updates (F = 00), there 
is a separation between the times when the push-relabel op- 
erations effectively determine the domain boundaries by sat- 
urating the appropriate bonds and the time when the relabel 
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Figure 1: Mean running time for finding the ground state of a 2D 
RFIM system vs. F using FIFO and A = 2.2, where the correlation 
length is much smaller than the sample sizes used, <^ L. For 
clarity in this figure and other figures in this section, statistical error 
bars are not shown, but are consistent with the apparent deviations 
from smooth curves. The figure shows (a) the number of PR steps 
A^PR used to find the ground state and (b) CPU time t[s] as a function 
of the global update interval F for L = 25, 50, 100, 200, and 400. 
For small F, A^pr is nearly independent of F showing that frequent 
global updates are unnecessary. At F somewhat less than O.ln — 
O.IL^, A^pR starts to increase. The minimum in the CPU time t is 
at Fmin ~ n, where the change in time needed for the additional PR 
steps balances the change in time needed for the global update. 



operations identify the up-spin regions. When F is finite, but 
larger than the number of PR steps needed to find all of the do- 
main boundaries and smaller than the total number of PR steps 
needed to terminate the algorithm, the first global update ef- 
fectively terminates the running of the algorithm and Nq ~ 1. 
There is another interval at smaller values of F where one 
global update is executed before the domain boundaries are 
determined and the second global update terminates the algo- 
rithm, giving Ng ~ 2. This pattern continues to higher values 
of m. We emphasize that the data presented is averaged over 
100 samples at each value of F; the data therefore indicates 
that the fluctuations in the locations of the linear regions are 
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Figure 2: Scaling of PR steps A*'pr and CPU time t with system 
size n for FIFO (A = 2.2). (a) The average number of PR steps 
per site, Npr/u, as a function of number of push-relabel steps per 
site per global update, T/n. The data collapse is consistent with the 
number of operations scaling with n (the best that can be expected 
for ^ <C L). (b) CPU time t per site (t/n) vs. T/n. The values along 
both axes have been divided by n the number of sites. A fair collapse 
develops for L > 100. The minimum is at Tmin/n ~ 1, independent 
of system size. The minimum is shallow, however, and even missing 
Fmin by a factor of ten increases the running time only by a factor of 
about two. 



quite small. 

We also noted a strong up-down asymmetry in the running 
time at small A. For A ^ A^; and F = n, it takes about 
50% longer to find the solution if the ground state is spin-up 
than if it is spin-down. For F ^ oo the ratio of mean run- 
ning times becomes very large. At small A, the ground state 
is determined by the sum over all random fields. If the sum 
is negative, the algorithm is done as soon as all the positive 
excess fields have been annihilated, but if the sum is positive 
all the sites with remaining excess fields must be moved up to 
height n before they are labeled inaccessible. This local re- 
labeling can take of the order of steps, as essentially each 
site must be moved stepwise to the maximal height. When a 



Figure 3: Plot showing the piecewise linear behavior of ATpR, the 
mean number of PR steps needed to find the ground state, when plot- 
ted as a function of F, the interval between global updates, for FIFO 
in d = 2. The parameter values are L — 200 and A = 2.2 and 
the data is averaged over 100 samples. A'^pr increases linearly over 
intervals in F, A'^pr = fcF, with the curves for k = 1, 2, 3, . . . in- 
dicated by the dashed lines. The large changes caused by the global 
update operation can lead to plateaus in the number of global updates 
Ng vs. F, as shown in the inset. 



global update is included, the height label of all sites is set to 
its maximum value once no more sites with negative excess 
exist. As the global relabeling takes only of the order of n 
steps, this makes the running times for up- and down- magne- 
tized samples more similar, to within 0{n) in total magnitude, 
forF = 0{n). 



3. Global update interval for HPQ 

For A < Aa:, the behavior of the timing for HPQ vs. F 
is very similar to the behavior of the timing for FIFO, with 
Topt ~ n. In contrast, in the paramagnetic regime, gap re- 
labeling detects enclosed domains efficiently and quickly re- 
duces the number of active sites. For large A, then, frequent 
global updates (F < n) quickly dominate the running time t. 
For F > n, < becomes independent of F (Fig.|4}, as global up- 
dates are not executed before the solution is found. Consistent 
with the optimal behavior for small A and the independence 
of t from F at large A, we will use Fopt ~ n for the HPQ 
structure. As F = oo is a reasonable choice for larger A, we 
will sometimes include this choice for comparison. 



B. FIFO vs. HPQ in two dimensions 

Now that we have found an optimal value for F for each 
data structure, we can directly compare the performance of 
FIFO and HPQ for various A. We compare timings using 
F = n for FIFO and both F = 72 and F = cx) for HPQ. 
We will refer to the latter two choices as HPQ„ and HPQoo, 
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Figure 4: CPU time t[s] for HPQ with gap relabeling vs. F for 
A = 2.2 > Ax- In this regime gap relabeling is effective and global 
updates are almost unnecessary. For T < n, the running time in- 
creases due to frequent global updates. For F > n, the running time 
stays constant. There is no discernible minimum in the t vs. F curve, 
but r — n does not hurt performance at large A and improves per- 
formance for smaller A. 



respectively. We computed the mean values of t and A^pr for 
a large range of system sizes and disorders. The running times 
were averaged over at least 100 (sometimes as many as 10^) 
samples, chosen independently for each value of L and T. 

Fig. |5] displays the running time per spin for FIFO with 
T = n and Fig. |6] presents a comparison of all three com- 
binations for L = 100. The first notable feature in the data is 
that all three combinations show a pronounced peak in iVpR 
at a value of (L) between 0.5 and 2.0 over the range L = 8 
to 4096. This peak is reminiscent of the critical slowing down 
near a phase transition. It moves slowly to the left with in- 
creasing system size (Fig.|5|i, consistent with A.j;{L) decay- 
ing with increasing L. As noted in Sec. mi there is no phase 
transition in 2D, but there is a crossover for any finite size sys- 
tem from a ferromagnetic regime at low A to a paramagnetic 
regime at large A as the correlation length becomes smaller 
than the system size. 

For small A, HPQ„ and FIFO perform very similarly. 
HPQoo, on the other hand, is 2 to 4 times slower in this regime. 
Near the crossover, the running times for HPQ„ and FIFO 
start to deviate and the difference is largest at A^;, where 
HPQn is about 3 times slower than FIFO and HPQoo is yet 
another factor of 3 slower than HPQ„ . 

For A > A^., FIFO starts to lose its advantage as domain 
sizes become smaller and gap relabeling becomes more and 
more effective with increasing A. When A « 3, HPQ„ starts 
to outperform FIFO; when A w 20, FIFO takes twice as many 
push operations as HPQ„. 



C. Timings for the double Queue (DFIFO) 

Our method for simultaneously rearranging both positive 
and negative excess, DFIFO, performs well for small A in 
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Figure 5: [Color online] Number of push-relabel operations per site 
needed to solve for the ground state, Npn/n, plotted as a function 
of A, for a 2D square lattice with Gaussian disorder. The number of 
spins n — ranges from 8^ to 4096^ . Due to the large range of 
system sizes and hence running times, we have used a logarithmic 
axis for plotting ATpa/n. Statistical error bars and individual points 
are not displayed, as the point spacing is small and the statistical 
uncertainties are very small, except for L = 4096. The global update 
interval used was F = n. Plateaus are seen when A'pr is close to a 
multiple of F, especially at larger A. Each curve show a pronounced 
peak, which can be used to define Ax{L). At this value of A, the 
correlation length is of the order of the system size and the system 
crosses over from a ferromagnetic regime with one large domain to 
a paramagnetic regime where the correlation length is smaller than 
the system size. The crossover field A^ approaches zero as L goes 
to infinity, but the approach is logarithmically slow. 
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Figure 6: [Color online] Number of PR steps A'pr in d = 2 for 
L = 100. This plot shows the timing for HPQ with (F = n) and 
without (F = cxd) global updates, compared with FIFO. Note the 
significant improvement the global update gives to HPQ for field 
strengths below and at the finite-size crossover.FIFO needs fewer 
push operations than HPQr=cx) over almost 3 orders of magnitude 
in A. This difference becomes more pronounced as the system size 
increases. 
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d = 2 but never better than FIFO. For large values of A, it 
is much slower than FIFO and HPQ. DFIFO performs even 
worse in 3D. The reason became clear when we visualized the 
rearrangements of excess field. The visualization shows that 
positive and negative excesses tend to miss each other The 
potential landscape, given by the two height fields, is not up- 
dated when excess is pushed and is not consistently updated 
by relabels. The rearrangement of excess is generally guided 
by the last global update: the negative excess is pushed to 
where the positive excess was at the time of the last global 
update, and vice versa. This lack of coordination between the 
two height fields becomes more pronounced in higher dimen- 
sions, where there are more possible paths between sites. It is 
possible that using some other combination of the height fields 
for a heuristic would produce better results. One alternative, 
for example, would be to combine the separate height fields 
for positive and negative excesses into a single field. Positive 
excess would move down the height field gradient and nega- 
tive excesses would move up the same gradient. 



D. Results for three dimensions (d = 3) 

In three dimensions, T = n remains a good choice for both 
FIFO and HPQ. As in the case d — 2, due to a very broad 
minimum in the dependence of t on F, the exact value of T is 
not critical. This is in agreement with previous results where 
rmin = n has been found to be a good choice for a variety of 
sparse and dense graphs ITsl. 

The peak in the timing of the algorithm is quite pronounced, 
as in the 2D data, but in this case the location of the peak 
converges to a fixed value at large L. We gathered data for 
the number of push-relabel operations iVpR per site, iVpR / n, 
vs. A, for sizes L ranging from 8 to 128 for HPQ and LPQ 
and up to size 256 for the FIFO data structure. The plot of the 
FIFO data (Fig.Q shows a growth in Np-^ /n with L and con- 
vergence to a well-defined curve for the running time for A 
in the paramagnetic range. In the ferromagnetic regime, there 
is a slow growth of the running time with sample dimension 
L. Both the LPQ and HPQ data structures are significantly 
slower than the FIFO data structure, at moderate values of A 
(with a ratio of sa 3 for the peak running times at L = 128), as 
seen in Fig.|8l The LPQ data structure is significantly faster 
than the HPQ structure on the paramagnetic side of the peak 
in the running time, but is very similar in speed on the fer- 
romagnetic side. The running time for LPQ converges much 
more quickly than the HPQ version to an L-dependent value 
as L is increased at fixed A > A^. 
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Figure 7: [Color online] Push-relabel operations (cycles) per site, 
NpYiJn, plotted vs. disorder strength A in d = 3, for L = 8, 16, 
32, 64, 128, and 256, for the push-relabel algorithm using the FIFO 
data structure. The global update interval is fixed at F = n. There 
is a clear peak in the running time near A Ri 2.3, which is in good 
agreement with the critical field Ac = 2.270(5) found O for the 
ferro- to paramagnetic phase transition for a Gaussian distribution. 
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Figure 8: [Color online] Push-relabel operations (cycles) per site 
iVpR/n vs. A in d = 3, for L = 8, 16, 32, 64, 128, and 256, 
when using the HPQ and LPQ data structures. There is again a clear 
peak in the running time near A « 2.3, but the peak is significantly 
broader and higher for HPQ, than for LPQ. Note that the number of 
cycles grows more quickly with L than for the FIFO structure. 



V. QUALITATIVE DESCRIPTION 

In order to better understand the timing results, we have 
visualized the evolution of the height fields and the rearrange- 
ment of excess. The visualization code |26] uses a color map 
to display the height field. The program has the option to dis- 
play the location of sites with excess (white for positive and 
black for negative) and to indicate where the flow is saturated. 



i.e., where ?'y = 0. A sample snapshot from the simulation 
with both of these options activated is shown in Fig.|5] 

When A is somewhat larger than A^; in d = 2, the differ- 
ences between the temporal progress using the HPQ and FIFO 
data structures are clearly seen in the dynamic visualization. 
As FIFO cycles through all active sites, all the positive ex- 
cesses move at a roughly uniform speed down the height gra- 
dients. The utility of the global update at late times for FIFO 
is apparent: when F is large (infrequent global updates), a few 
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Figure 9: [Color online] Image of the state of the auxiliary fields 
during the solution of a 2D RFIM of size L = 50 and with disorder 
strength A = 2. This picture shows a snapshot of the height field Ui, 
the location of non-zero excess e{i), and bonds with zero residual 
strength (r^j = 0), while executing the push-relabel algorithm (HPQ 
data structure). The red (darkest non-black) regions have already 
been identified as up; the heights in this region have the maximal 
value n = 2500. The colors that are not black or white correspond 
to the height field in the remainder of the sample: blue (darker colors) 
indicates a low, while green (lighter colors) indicates a high height. 
The white squares represent spins with a positive excess field (the 
height at these sites is not indicated); black squares represent spins 
with a negative excess (and have height zero). The black lines are 
dual to the bonds that are saturated (i.e., have zero residual strength). 
These saturated dual bonds (r^j = or rji = 0) "block" the rear- 
rangement of excess e^. More up-spin regions may be identified as 
more bonds are saturated and the dual bonds link up to isolate a re- 
gion. The algorithm terminates when all positive excess is confined 
to up-spin regions. 



positive excesses are seen to skate around in regions contained 
by saturated bonds. The pushes have found the minimum cut 
by saturating the bonds that separate the positive and negative 
spin regions, but the algorithm hasn't confirmed that fact yet. 
The local relabels, which tend to raise a site's height only by 
one step at a time, are inefficient in raising a large up-spin 
domain to maximal height. The remaining positive excesses 
are shuffled around within the domain, slowly increasing the 
height by small relabels. When global updates are infrequent, 
the algorithm may terminate by a final global update, which 
finds that a set of positive excesses is isolated from all sinks. 
This confirms the picture discussed in Sec IIV A7\ HPQ, on 
the other hand, tends to act repeatedly on the same site. Iso- 
lated regions with positive excess are raised uniformly above 
the rest of the sample. This allows the gap heuristic to quickly 
identify isolated positive spin domains, even when F is large. 



For samples with weak disorder, the algorithm with the 
HPQ data structure does lead to a few positive excesses rac- 
ing around the network while the others sit idly. The time 
to raise a large positive domain high enough to create a gap 
is large. Again, active sites tend to move around quite a bit, 
while other regions remain unchanged. Towards the middle 
of the algorithm execution, the lattice often displays lines or 
rings of positive excess sitting on an equipotential line near 
a sink, often immediately adjacent to it. These structures are 
at low height and cannot move until all excesses of greater 
height have been removed. This should slow the algorithm 
down. The positive excesses do not "screen" the sinks, since 
the global update doesn't differentiate between sinks of dif- 
ferent capacity. This expectation is consistent with our results 
for the LPQ, which performs somewhat better than HPQ, es- 
pecially at higher field strengths. 

At large field strengths, the proportion of active sites whose 
random field strength is greater than the strength of their 
bonds to neighbors (i.e., A > 4 in 2D) is large. The algo- 
rithm acts on most sites only twice. One can clearly see how 
HPQ sweeps the lattice and raises (nearly) all the sites of ini- 
tial height two (those with no adjacent sinks) to the maximum 
height, then does the same to sites of height one. FIFO also 
generally makes two or three passes, though its first pass ex- 
ecutes pushes on the sites where pushing is possible and rela- 
bels otherwise. 



VI. SUMMARY 

As has been demonstrated before OQIlSSElllllIll 
I34I1 . the PR algorithm is an efficient method to find the exact 
ground state for the RFIM at T=0 in any dimension. Here 
we investigated how to implement PR to find the ground state 
most quickly. We compared a number of data structures and 
the effect of global and gap relabeling on the performance of 
the algorithm. 

In agreement with previous work jl34ll, our detailed results 
recommend the FIFO-queue combined with global updates 
every F « fcn steps, with k a number near unity. FIFO 
performs much better near the crossover disorder (A^; for 
d = 1,2) and the critical disorder (Ac in d = 3) and never 
performs significantly worse than HPQ. The exact value of T 
is not crucial since the minimum in t vs. F is very broad. 

We also tried an implementation that treats positive and 
negative excesses equally. This implementation, however, suf- 
fers from the lack of coordination between the height fields for 
the two sets of excesses and positive and negative excesses 
tend to miss each other It is likely that this algorithm could 
be improved by using a single height field to coordinate the 
motion of the positive and negative excess. 

A more flexible approach to the global update interval 
might also be useful in speeding up simulations. It is likely 
that adaptively modifying the global updates so that they are 
executed when the sink density changes by a defined fraction 
or packets of excess have travelled a given distance would op- 
timize the algorithm during each stage of the solution. There 
is still a lot of room for other modification, e.g., cutting off 
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the breadth first search to reflect saturation and non-uniform 
intervals for global update that depend on the number of sinks 
that have been annihilated since the last global update rather 
than the number of PR steps. 

We found that the visualization of the operations (see l26ll 
for the source code) greatly improved our understanding of 
the algorithm. This code may be useful in suggesting further 
improvements to the algorithm. 
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